Classification of heterogeneous text data for robust domain-specific language modeling
Authors
Abstract
Similar resources
Classification of heterogeneous text data for robust domain-specific language modeling
The robustness of n-gram language models depends on the quality of text data on which they have been trained. The text corpora collected from various resources such as web pages or electronic documents are characterized by many possible topics. In order to build efficient and robust domain-specific language models, it is necessary to separate domain-oriented segments from the large amount of te...
A Domain Specific Modeling Language for REA
The Resource-Event-Agent (REA) ontology has its roots in the accounting discipline and was originally developed as a reference framework to conceptualize economic phenomena in an enterprise. In its proposal in 1982, McCarthy already had the vision to facilitate the design of data structures of accounting information systems by means of REA [1]. Since this time the REA model has been further ext...
Topic Modeling and Classification of Cyberspace Papers Using Text Mining
The global cyberspace networks provide individuals with platforms to interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
Bi-Weighting Domain Adaptation for Cross-Language Text Classification
Text classification is widely used in many real-world applications. To obtain satisfactory classification performance, most traditional data mining methods require lots of labeled data, which can be costly in terms of both time and human effort. In reality, there are plenty of such resources in English since it has the largest population in the Internet world, which is not true in many other langu...
Language Modeling for Multi-Domain Speech-Driven Text Retrieval
We report experimental results associated with speech-driven text retrieval, which facilitates retrieving information in multiple domains with spoken queries. Since users speak contents related to a target collection, we produce language models used for speech recognition based on the target collection, so as to improve both the recognition and retrieval accuracy. Experiments using existing tes...
Journal
Journal title: EURASIP Journal on Audio, Speech, and Music Processing
Year: 2014
ISSN: 1687-4722
DOI: 10.1186/1687-4722-2014-14